Tirana County
Four of the Strangest AI Moments in 2025
Pillay is an editorial fellow at TIME. Albania's new AI-generated minister Diella speaks during the parliamentary session for the voting of the new government of Albania, in Tirana on Sept. 18, 2025. Albania's new AI-generated minister Diella speaks during the parliamentary session for the voting of the new government of Albania, in Tirana on Sept. 18, 2025. Pillay is an editorial fellow at TIME. It's been three years since the launch of ChatGPT gave hundreds of millions of people access to a kind of digital genie in their pocket--and things have been getting stranger by the month. Besides billions of AI-generated emails and the technology's widespread disruption of education and cognitive work, in 2025, some people began to fall in love with their AIs.
The World's First AI-Powered Minister Tests the Future of Government
Pillay is an editorial fellow at TIME. Albania's new AI-generated minister Diella speaks during the parliamentary session for the voting of the new government of Albania, in Tirana, on September 18, 2025. Albania's new AI-generated minister Diella speaks during the parliamentary session for the voting of the new government of Albania, in Tirana, on September 18, 2025. Pillay is an editorial fellow at TIME. In September, Albania appointed an AI system to a cabinet-level position--a world-first. Called Diella (Albanian for "sun"), the system was declared "Minister of State for Artificial Intelligence," and tasked by Albania's Prime Minister with addressing corruption in government contracting.
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Cahyawijaya, Samuel, Lovenia, Holy, Moniz, Joel Ruben Antony, Wong, Tack Hwa, Farhansyah, Mohammad Rifqi, Maung, Thant Thiri, Hudi, Frederikus, Anugraha, David, Habibi, Muhammad Ravi Shulthan, Qorib, Muhammad Reza, Agarwal, Amit, Imperial, Joseph Marvin, Patel, Hitesh Laxmichand, Feliren, Vicky, Nasution, Bahrul Ilmi, Rufino, Manuel Antonio, Winata, Genta Indra, Rajagede, Rian Adam, Catalan, Carlos Rafael, Imam, Mohamed Fazli, Pattnayak, Priyaranjan, Pranida, Salsabila Zahirah, Pratama, Kevin, Bangera, Yeshil, Na-Thalang, Adisai, Monderin, Patricia Nicole, Song, Yueqi, Simon, Christian, Ng, Lynnette Hui Xian, Sapan, Richardy Lobo', Rafi, Taki Hasan, Wang, Bin, Supryadi, null, Veerakanjana, Kanyakorn, Ittichaiwong, Piyalitt, Roque, Matthew Theodore, Vincentio, Karissa, Kreangphet, Takdanai, Artkaew, Phakphum, Palgunadi, Kadek Hendrawan, Yu, Yanzhi, Hastuti, Rochana Prih, Nixon, William, Bangera, Mithil, Lim, Adrian Xuan Wei, Khine, Aye Hninn, Zhafran, Hanif Muhammad, Ferdinan, Teddy, Izzani, Audra Aurora, Singh, Ayushman, Evan, null, Krito, Jauza Akbar, Anugraha, Michael, Ilasariya, Fenal Ashokbhai, Li, Haochen, Daniswara, John Amadeo, Tjiaranata, Filbert Aurelian, Yulianrifat, Eryawan Presma, Udomcharoenchaikit, Can, Ansori, Fadil Risdian, Ihsani, Mahardika Krisna, Nguyen, Giang, Barik, Anab Maulana, Velasco, Dan John, Genadi, Rifo Ahmad, Saha, Saptarshi, Wei, Chengwei, Flores, Isaiah, Chen, Kenneth Ko Han, Santos, Anjela Gail, Lim, Wan Shen, Phyo, Kaung Si, Santos, Tim, Dwiastuti, Meisyarah, Luo, Jiayun, Cruz, Jan Christian Blaise, Hee, Ming Shan, Hanif, Ikhlasul Akmal, Hakim, M. Alif Al, Sya'ban, Muhammad Rizky, Kerdthaisong, Kun, Miranda, Lester James V., Koto, Fajri, Fatyanosa, Tirana Noor, Aji, Alham Fikri, Rosal, Jostin Jerico, Kevin, Jun, Wijaya, Robert, Kampman, Onno P., Zhang, Ruochen, Karlsson, Börje F., Limkonchotiwat, Peerat
Southeast Asia (SEA) is a region of extraordinary linguistic and cultural diversity, yet it remains significantly underrepresented in vision-language (VL) research. This often results in artificial intelligence (AI) models that fail to capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for SEA languages. By involving contributors from SEA countries, SEA-VL aims to ensure better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages in VL research. Beyond crowdsourcing, our initiative goes one step further in the exploration of the automatic collection of culturally relevant images through crawling and image generation. First, we find that image crawling achieves approximately ~85% cultural relevance while being more cost- and time-efficient than crowdsourcing. Second, despite the substantial progress in generative vision models, synthetic images remain unreliable in accurately reflecting SEA cultures. The generated images often fail to reflect the nuanced traditions and cultural contexts of the region. Collectively, we gather 1.28M SEA culturally-relevant images, more than 50 times larger than other existing datasets. Through SEA-VL, we aim to bridge the representation gap in SEA, fostering the development of more inclusive AI systems that authentically represent diverse cultures across SEA.
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Muhammad, Shamsuddeen Hassan, Ousidhoum, Nedjma, Abdulmumin, Idris, Yimam, Seid Muhie, Wahle, Jan Philip, Ruas, Terry, Beloucif, Meriem, De Kock, Christine, Belay, Tadesse Destaw, Ahmad, Ibrahim Said, Surange, Nirmal, Teodorescu, Daniela, Adelani, David Ifeoluwa, Aji, Alham Fikri, Ali, Felermino, Araujo, Vladimir, Ayele, Abinew Ali, Ignat, Oana, Panchenko, Alexander, Zhou, Yi, Mohammad, Saif M.
We present our shared task on text-based emotion detection, covering more than 30 languages from seven distinct language families. These languages are predominantly low-resource and spoken across various continents. The data instances are multi-labeled into six emotional classes, with additional datasets in 11 languages annotated for emotion intensity. Participants were asked to predict labels in three tracks: (a) emotion labels in monolingual settings, (b) emotion intensity scores, and (c) emotion labels in cross-lingual settings. The task attracted over 700 participants. We received final submissions from more than 200 teams and 93 system description papers. We report baseline results, as well as findings on the best-performing systems, the most common approaches, and the most effective methods across various tracks and languages. The datasets for this task are publicly available.
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Singh, Shivalika, Romanou, Angelika, Fourrier, Clémentine, Adelani, David I., Ngui, Jian Gang, Vila-Suero, Daniel, Limkonchotiwat, Peerat, Marchisio, Kelly, Leong, Wei Qi, Susanto, Yosephine, Ng, Raymond, Longpre, Shayne, Ko, Wei-Yin, Smith, Madeline, Bosselut, Antoine, Oh, Alice, Martins, Andre F. T., Choshen, Leshem, Ippolito, Daphne, Ferrante, Enzo, Fadaee, Marzieh, Ermis, Beyza, Hooker, Sara
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from language but also from the cultural knowledge required to interpret questions, reducing the practical utility of translated datasets like MMLU. Furthermore, translation often introduces artifacts that can distort the meaning or clarity of questions in the target language. A common practice in multilingual evaluation is to rely on machine-translated evaluation sets, but simply translating a dataset is insufficient to address these challenges. In this work, we trace the impact of both of these issues on multilingual evaluations and ensuing model performances. Our large-scale evaluation of state-of-the-art open and proprietary models illustrates that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge. Moreover, for questions requiring geographic knowledge, an astounding 84.9% focus on either North American or European regions. Rankings of model evaluations change depending on whether they are evaluated on the full portion or the subset of questions annotated as culturally sensitive, showing the distortion to model rankings when blindly relying on translated MMLU. We release Global-MMLU, an improved MMLU with evaluation coverage across 42 languages -- with improved overall quality by engaging with compensated professional and community annotators to verify translation quality while also rigorously evaluating cultural biases present in the original dataset. This comprehensive Global-MMLU set also includes designated subsets labeled as culturally sensitive and culturally agnostic to allow for more holistic, complete evaluation.
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages
Guo, Jia, Dou, Longxu, Zeng, Guangtao, Kok, Stanley, Lu, Wei, Liu, Qian
In this paper, we introduce SailCompass, a reproducible and robust evaluation benchmark for assessing Large Language Models (LLMs) on Southeast Asian Languages (SEA). SailCompass encompasses three main SEA languages, eight primary tasks including 14 datasets covering three task types (generation, multiple-choice questions, and classification). To improve the robustness of the evaluation approach, we explore different prompt configurations for multiple-choice questions and leverage calibrations to improve the faithfulness of classification tasks. With SailCompass, we derive the following findings: (1) SEA-specialized LLMs still outperform general LLMs, although the gap has narrowed; (2) A balanced language distribution is important for developing better SEA-specialized LLMs; (3) Advanced prompting techniques (e.g., calibration, perplexity-based ranking) are necessary to better utilize LLMs. All datasets and evaluation scripts are public.